FreClu: Efficient Frequency-based De novo Short Read Clustering -- preparation for input data
Use the Java VM options -Xms (initial heap size) and -Xmx (maximum heap size) to tune the heap size.
E.g. java -Xms10G -Xmx10G Radix_DNAseq
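To confirm that the heap options took effect, a small standalone program can print the limits the JVM reports. This is a minimal sketch; HeapReport is a hypothetical helper and is not one of the FreClu programs. The value from maxMemory() may be slightly lower than the -Xmx setting.

```java
// Minimal sketch: print the heap limits reported by the running JVM.
// HeapReport is a hypothetical helper, not part of FreClu.
class HeapReport {
    // Convert a byte count to GiB (1 GiB = 1024^3 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use, governed by -Xmx.
        System.out.printf("max heap:     %.2f GiB%n", toGiB(rt.maxMemory()));
        // Heap currently reserved by the JVM, initially governed by -Xms.
        System.out.printf("current heap: %.2f GiB%n", toGiB(rt.totalMemory()));
    }
}
```

Running it with the same options as the real program, e.g. java -Xms10G -Xmx10G HeapReport, shows whether the machine can actually grant the requested heap.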
Note: Compile the Java source files with javac before running any of the programs below.
javac QV_format1.java
javac QV_format1_qseq.java
javac QV_format1_fastq.java
javac QV_filter.java
javac QV_format2.java
javac ChangeSolexaQVCountTagToFASTA.java
javac OverlapWithoutGap.java
javac AlignmentMain.java
javac FormatOverlap.java
javac Merge_randomModel.java
javac Radix_DNAseq.java
javac Merge.java
javac Radix_num.java
Usage:
The procedure consists of five steps, described below.
(1) REQUIRED :
(1)-1. Join all raw Illumina sequence files <*_seq.txt> with their QV files <*_prb.txt>, and remove every sequence that contains the ambiguous base N. The output file is named <*_seq-prb.txt>.
java QV_format1 <*_seq.txt>
OR
(1)-2. For raw sequence files in Illumina <*_qseq.txt> format, in which sequences and QVs are already merged into one file:
java QV_format1_qseq <*_qseq.txt>
OR
(1)-3. For raw sequence files in Illumina <*fastq> format, in which sequences and QVs are already merged into one file:
java QV_format1_fastq <*fastq>
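The removal of reads containing the ambiguous base N, mentioned in (1)-1, can be illustrated on FASTQ input. This is a minimal sketch, not the actual code of QV_format1_fastq. It assumes standard four-line FASTQ records (header, sequence, '+' line, quality string) and assumes the same N rule applies to FASTQ input. The class and method names are hypothetical, and the <*_seq-prb.txt> output format is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch (hypothetical names): drop FASTQ records whose read contains N.
class AmbiguousBaseFilter {
    // True if the read contains the ambiguous base N.
    static boolean hasAmbiguousBase(String sequence) {
        return sequence.indexOf('N') >= 0;
    }

    // Assumes four lines per record: header, sequence, '+' line, qualities.
    // Trailing lines that do not form a complete record are ignored.
    static List<String> dropAmbiguousRecords(List<String> fastqLines) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i + 3 < fastqLines.size(); i += 4) {
            String sequence = fastqLines.get(i + 1);
            if (!hasAmbiguousBase(sequence)) {
                kept.addAll(fastqLines.subList(i, i + 4));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "@read1", "ACGTACGT", "+", "IIIIIIII",
            "@read2", "ACGNACGT", "+", "IIIIIIII");
        // Only the read1 record survives; read2 contains an N.
        for (String line : dropAmbiguousRecords(lines)) {
            System.out.println(line);
        }
    }
}
```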
(2) OPTIONAL : Apply a QV filter to the output file <*_seq-prb.txt> of (1) to filter out apparently low-quality reads. The filter we used allowed at most 4 of the first 20 bases of a read to have QV < 9.
java QV_filter
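The filtering rule in (2) can be sketched as a single predicate. This is only an illustration of the stated rule, not the source of QV_filter. It assumes that the per-base QVs of a read are already available as an int array and that reads failing the check are discarded. The class, method, and parameter names are hypothetical.

```java
// Minimal sketch (hypothetical names): the rule "at most 4 of the first 20
// bases may have QV < 9", with the three numbers exposed as parameters.
class QvFilterSketch {
    static boolean passes(int[] qvs, int window, int minQv, int maxLowBases) {
        // Reads shorter than the window are checked over their full length.
        int limit = Math.min(window, qvs.length);
        int lowCount = 0;
        for (int i = 0; i < limit; i++) {
            if (qvs[i] < minQv) {
                lowCount++;
            }
        }
        return lowCount <= maxLowBases;
    }

    public static void main(String[] args) {
        int[] read = new int[36];
        java.util.Arrays.fill(read, 30); // all bases high quality
        read[0] = 5;
        read[1] = 5;
        read[2] = 5;
        read[3] = 5;                     // four low-QV bases: still allowed
        System.out.println(passes(read, 20, 9, 4));
        read[4] = 5;                     // a fifth low-QV base: read is rejected
        System.out.println(passes(read, 20, 9, 4));
    }
}
```

Low-QV bases beyond the first 20 positions do not affect the result, which matches the wording of the rule.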